Creators/Authors contains: "Kim, Seohyun"


  1. Abstract

    Textual data are increasingly common in test data as many assessments include constructed response (CR) items as indicators of participants' understanding. The development of techniques based on natural language processing has made it possible for researchers to rapidly analyse large sets of textual data. One family of statistical techniques for this purpose is probabilistic topic models. Topic modelling is a technique for detecting the latent topic structure in a collection of documents and has been widely used to analyse texts in a variety of areas. The detected topics can reveal primary themes in the documents, and the relative use of topics can be useful in investigating the variability of the documents. Supervised latent Dirichlet allocation (SLDA) is a popular topic model in this family that jointly models textual data and paired responses, such as participants' textual answers to CR items and their rubric‐based scores. SLDA assumes a homogeneous relationship between textual data and paired responses across all documents. This assumption, while useful for some purposes, may not hold in situations in which a population has subgroups with different relationships. In this study, we introduce a new supervised topic model that incorporates finite‐mixture modelling into SLDA. This new model can detect latent groups of participants that have different relationships between their textual responses and associated scores. The model is illustrated with an example from an analysis of a set of textual responses and paired scores from a middle grades assessment of science inquiry knowledge. A simulation study is presented to investigate the performance of the proposed model under practical testing conditions.
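The abstract does not give the model's equations. Purely as an illustration, the Python sketch below simulates data from one plausible form of a mixture SLDA, under these assumptions: each participant belongs to a latent class c drawn from mixing weights; topic proportions follow a Dirichlet(alpha) prior shared by all classes; and, as in standard SLDA, the score is normal with a mean that is linear in the empirical topic frequencies z_bar, except that the regression weights differ by class. All function and parameter names are hypothetical, and only data generation is shown, not estimation.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index in range(len(probs)) drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_mixture_slda(n_docs, doc_len, alpha, topics, class_weights,
                          class_coefs, sigma, seed=0):
    """Simulate (words, score) pairs from a hypothetical mixture-SLDA.

    topics[k]        -- word distribution of topic k over a shared vocabulary
    class_weights[c] -- mixing proportion of latent class c
    class_coefs[c]   -- class-specific regression weights on topic frequencies
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(n_docs):
        c = sample_categorical(class_weights, rng)   # latent class of the participant
        theta = sample_dirichlet(alpha, rng)         # per-document topic proportions
        words, counts = [], [0] * len(topics)
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)       # topic assignment for this word
            words.append(sample_categorical(topics[z], rng))
            counts[z] += 1
        z_bar = [n / doc_len for n in counts]         # empirical topic frequencies
        # As in SLDA, the score is Gaussian with mean linear in z_bar,
        # but here the coefficients depend on the latent class c (assumption).
        mean = sum(b * f for b, f in zip(class_coefs[c], z_bar))
        score = rng.gauss(mean, sigma)
        docs.append({"class": c, "words": words, "z_bar": z_bar, "score": score})
    return docs
```

For example, with two topics over a three-word vocabulary and two classes whose coefficients are reversed, the same writing pattern maps to high scores in one class and low scores in the other, which is the kind of heterogeneity a single-class SLDA cannot represent.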

     
  2. Abstract
    Growth mixture modeling is a popular analytic tool for longitudinal data analysis. It detects latent groups based on the shapes of growth trajectories. Traditional growth mixture modeling assumes that outcome variables are normally distributed within each class. When data violate this normality assumption, however, it is well documented that traditional growth mixture modeling misleads researchers in determining the number of latent classes as well as in estimating parameters. To address nonnormal data in growth mixture modeling, robust methods based on various nonnormal distributions have been developed. As a new robust approach, growth mixture modeling based on conditional medians has been proposed. In this article, we present the results of two simulation studies that evaluate the performance of median-based growth mixture modeling in identifying the correct number of latent classes when data follow the normality assumption or have outliers. We also compare the performance of median-based growth mixture modeling with that of traditional growth mixture modeling and of robust growth mixture modeling based on t distributions. For identifying the number of latent classes in growth mixture modeling, three Bayesian model comparison criteria were considered: the deviance information criterion, the Watanabe-Akaike information criterion, and leave-one-out cross-validation. Both median-based and t-based growth mixture modeling maintained high model selection accuracy across all conditions in this study (ranging from 87% to 100%). For traditional growth mixture modeling, however, model selection accuracy was strongly influenced by the proportion of outliers. When the sample size was 500 and the proportion of outliers was 0.05, the correct model was preferred in about 90% of the replications, but this percentage dropped to about 40% as the proportion of outliers increased to 0.15.
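To make one of the three criteria concrete, here is a minimal sketch of the Watanabe-Akaike information criterion computed from pointwise log-likelihoods log p(y_i | θ^(s)) evaluated at S posterior draws, using the common definition WAIC = −2(lppd − p_WAIC), where lppd sums the log of each observation's posterior-mean likelihood and p_WAIC sums the posterior variance of each observation's log-likelihood. The function name and the draws-by-observations input layout are assumptions; in practice these values would come from whatever sampler fitted each candidate growth mixture model.

```python
import math


def waic(log_lik):
    """WAIC from a pointwise log-likelihood matrix (list of lists).

    log_lik[s][i] = log p(y_i | theta^(s)) for posterior draw s and observation i.
    Requires at least two draws. Returns (waic, lppd, p_waic) on the deviance
    scale, where waic = -2 * (lppd - p_waic).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        m = max(column)
        # log of the posterior-mean likelihood, computed stably via log-sum-exp
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        mean = sum(column) / n_draws
        # sample variance (n - 1 denominator) of this observation's log-likelihood
        p_waic += sum((v - mean) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), lppd, p_waic
```

Lower WAIC indicates better estimated out-of-sample predictive accuracy, so in a class-enumeration study one would fit the model with 1, 2, 3, ... classes and prefer the solution with the smallest value.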
  3. Abstract
    Selected response items and constructed response (CR) items are often found in the same test. Conventional psychometric models for these two types of items typically focus on the scores for correctness of the responses. Recent research suggests, however, that CR items may carry more information than scores for correctness alone. In this study, we describe an approach in which a statistical topic model and a diagnostic classification model (DCM) were applied to a mixed item format formative test of English and Language Arts. The DCM was used to estimate students’ mastery status of reading skills. These mastery statuses were then included in a topic model as covariates to predict students’ use of each of the latent topics in their written answers to a CR item. This approach enabled investigation of the effects of mastery status of reading skills on writing patterns. Results indicated that one of the skills, Integration of Knowledge and Ideas, helped detect and explain students’ writing patterns with respect to students’ use of individual topics.
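The abstract does not say which DCM was used. Purely as an illustration, the sketch below assumes the DINA model, a widely used DCM in which a student answers an item correctly with probability 1 − slip if they have mastered every skill the Q-matrix requires for that item, and with probability guess otherwise. It enumerates all 2^K mastery profiles, computes each profile's posterior for one student under a uniform prior with known item parameters, and returns the marginal probability that each skill is mastered. The Q-matrix, slip and guess values, and all function names are hypothetical.

```python
from itertools import product


def dina_prob(alpha, q_row, slip, guess):
    """P(correct) under DINA: 1 - slip if all required skills are mastered, else guess."""
    has_all_required = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if has_all_required else guess


def posterior_profiles(responses, q_matrix, slips, guesses):
    """Posterior over all 2^K mastery profiles for one student (uniform prior)."""
    n_skills = len(q_matrix[0])
    profiles = list(product([0, 1], repeat=n_skills))
    weights = []
    for alpha in profiles:
        like = 1.0
        for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
            p = dina_prob(alpha, q_row, s, g)
            like *= p if x == 1 else 1.0 - p   # Bernoulli likelihood of the response
        weights.append(like)
    total = sum(weights)
    return {alpha: w / total for alpha, w in zip(profiles, weights)}


def mastery_probabilities(posterior, n_skills):
    """Marginal posterior probability that each skill is mastered."""
    return [sum(p for alpha, p in posterior.items() if alpha[k] == 1)
            for k in range(n_skills)]
```

In an approach like the one described, the resulting mastery statuses (for example, thresholded at 0.5) would then enter the topic model as covariates; that second step is not shown here.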
  4. Abstract
    The prebiotic synthesis of ribonucleotides is likely to have been accompanied by the synthesis of noncanonical nucleotides including the threo-nucleotide building blocks of TNA. Here, we examine the ability of activated threo-nucleotides to participate in nonenzymatic template-directed polymerization. We find that primer extension by multiple sequential threo-nucleotide monomers is strongly disfavored relative to ribo-nucleotides. Kinetic, NMR and crystallographic studies suggest that this is due in part to the slow formation of the imidazolium-bridged TNA dinucleotide intermediate in primer extension, and in part to the greater distance between the attacking RNA primer 3′-hydroxyl and the phosphate of the incoming threo-nucleotide intermediate. Even a single activated threo-nucleotide in the presence of an activated downstream RNA oligonucleotide is added to the primer 10-fold more slowly than an activated ribonucleotide. In contrast, a single activated threo-nucleotide at the end of an RNA primer or in an RNA template results in only a modest decrease in the rate of primer extension, consistent with the minor and local structural distortions revealed by crystal structures. Our results are consistent with a model in which heterogeneous primordial oligonucleotides would, through cycles of replication, have given rise to increasingly homogeneous RNA strands.
  5. Abstract
    The emergence of primordial RNA-based life would have required the abiotic synthesis of nucleotides, and their participation in nonenzymatic RNA replication. Although considerable progress has been made toward potentially prebiotic syntheses of the pyrimidine nucleotides (C and U) and their 2-thio variants, efficient routes to the canonical purine nucleotides (A and G) remain elusive. Reported syntheses are low-yielding and generate a large number of undesired side products. Recently, a potentially prebiotic pathway to 8-oxo-adenosine and 8-oxo-inosine has been demonstrated, raising the question of the suitability of the 8-oxo-purines as substrates for prebiotic RNA replication. Here we show that the 8-oxo-purine nucleotides are poor substrates for nonenzymatic RNA primer extension, both as activated monomers and when present in the template strand; their presence at the end of a primer also strongly reduces the rate and fidelity of primer extension. To provide a proper comparison with 8-oxo-inosine, we also examined primer extension reactions with inosine, and found that inosine exhibits surprisingly rapid and accurate nonenzymatic RNA copying. We propose that inosine, which can be derived from adenosine by deamination, could have acted as a surrogate for G in the earliest stages of the emergence of life.

     
  6. Abstract

    Conventional assessment analysis of student results, referred to as rubric‐based assessments (RBA), has emphasized numeric scores as the primary way of communicating information to teachers about their students’ learning. In this light, rethinking and reflecting on not only how scores are generated but also what analyses are done with them to inform classroom practices is of utmost importance. Informed by Systemic Functional Linguistics and Latent Dirichlet Allocation analyses, this study utilizes an innovative bilingual (Spanish–English) constructed response assessment of science and language practices for middle and high school students to perform a multilayered analysis of student responses. We explore multiple ways of looking at students’ performance through their written assessments and discuss features of student responses that are made visible through these analyses. Findings from this study suggest that science educators would benefit from a multidimensional model which deploys complementary ways in which we can interpret student performance. This understanding leads us to think that researchers and developers in the field of assessment need to promote approaches that analyze student science performance as a multilayered phenomenon.

     